Handling of Missing Data

Purpose

This document describes how missing data is handled in user queries, specifically after the requested data is found, but before it is sorted and before values requested in the query are extracted from the data.

Summary

Missing data is interpreted as (i) present and (ii) having unset values.

Specification

Below, we refer to an object's "properties". A property can be viewed as a key:value pair. Keys are strings. Values have various types. An object can have multiple properties. For the purpose of explaining the policy, it is instructive to imagine that an object's properties are stored in an associative container, so that a property value can be extracted by looking up its key, which is a string.

Below, we use the words 'missing' and 'absent' interchangeably.

The policy for handling missing data is as follows:

  • missing property values are interpreted as unset, which corresponds to null in JSON and None in Python;
  • missing property keys are interpreted as present, with the corresponding values unset;
  • when property values are compared during sorting, missing values compare as less than non-missing values; and
  • property keys that are missing are never inserted, thus avoiding path-dependency (or hysteresis) in query results.

As a technical remark, we note that no distinction is drawn between property keys that are (a) absent in some objects but present in other objects, and (b) absent from the database.
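The lookup part of this policy can be sketched in Python, under the assumption stated above that an object's properties live in an associative container (here a plain dict). The helper name lookup and the sample objects are hypothetical and only serve as illustration; this is not the database's implementation.

```python
from typing import Any, Dict, Optional


def lookup(properties: Dict[str, Any], key: str) -> Optional[Any]:
    """Return the value stored under key, or None if the key is missing."""
    # dict.get only reads: a missing key is reported as None but is never
    # inserted, so a lookup cannot influence the result of a later query.
    return properties.get(key)


# Hypothetical objects, modelled as dicts of properties.
green = {"color": "green", "origin": "Allen, Neuquen, Argentina"}
red = {"color": "red"}

origin_green = lookup(green, "origin")  # key present, value set
origin_red = lookup(red, "origin")      # key absent in this object only
iq_red = lookup(red, "IQ")              # key absent from every object
```

In this sketch both kinds of absent key yield None, and neither "origin" nor "IQ" appears in red after the lookups.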

Example 1

Let us try to retrieve properties that were not added to certain entities. Let us first insert some entities:

Query:

[{
"AddEntity": {
"class": "Apple_3",
"properties": {
"color": "green",
"type": "granny_smith",
"origin": "Allen, Neuquen, Argentina"
}
}
}, {
"AddEntity": {
"class": "Apple_3",
"properties": {
"color": "red",
"type": "granny_smith"
}
}
}]

Note that the second entity does not have the "origin" property set.

Response:

[{
"AddEntity": {
"status": 0
}
}, {
"AddEntity": {
"status": 0
}
}]

Now, let us try to retrieve the "origin" property, together with "color" and "type", for all entities:

Query:

[{
"FindEntity": {
"with_class": "Apple_3",
"results": {
"list": ["color", "type", "origin"]
}
}
}]

Response:

[{
"FindEntity": {
"entities": [{
"color": "green",
"origin": "Allen, Neuquen, Argentina",
"type": "granny_smith"
}, {
"color": "red",
"origin": null,
"type": "granny_smith"
}],
"returned": 2,
"status": 0
}
}]

Because the property "origin" is not set for the red apple, its value will be null in the response.
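The shape of this response can be imitated with a small Python sketch. It assumes entities are plain dicts; the helper project is a hypothetical name, not part of the query API. Python's None is serialized as JSON null by the standard json module.

```python
import json


def project(entity, keys):
    """Build one result row: every requested key appears, unset ones as None."""
    return {key: entity.get(key) for key in keys}


# The two entities inserted in Example 1.
entities = [
    {"color": "green", "type": "granny_smith",
     "origin": "Allen, Neuquen, Argentina"},
    {"color": "red", "type": "granny_smith"},
]

rows = [project(entity, ["color", "type", "origin"]) for entity in entities]

# json.dumps turns None into null; sort_keys orders keys alphabetically.
response_text = json.dumps(rows, sort_keys=True)
```

The stored red apple is left untouched: the "origin" key exists only in the projected row, not in the entity itself.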

Example 2

Let us use an unrelated property for sorting the results. We look for apples, list some of their properties (as key:value pairs) and sort the results by intelligence ("IQ"), a property that none of the apples has.

Query:

[{
"FindEntity": {
"with_class": "Apple_1",
"results": {
"list": ["color", "type", "IQ"],
"sort": "IQ"
}
}
}]

Response:

[{
"FindEntity": {
"entities": [{
"IQ": null,
"color": "red",
"type": "granny_smith"
}, {
"IQ": null,
"color": "green",
"type": "granny_smith"
}],
"returned": 2,
"status": 0
}
}]

Whether using IQ as a sorting field is a mistake is left for the users to decide. If IQ==null is not what they expect, then this is how the users learn that they made a mistake. If the users do not list the IQ property in the results, the response simply omits it:

Query:

[{
"FindEntity": {
"with_class": "Apple_2",
"results": {
"list": ["color", "type"],
"sort": "IQ"
}
}
}]

Response:

[{
"FindEntity": {
"entities": [{
"color": "red",
"type": "granny_smith"
}, {
"color": "green",
"type": "granny_smith"
}],
"returned": 2,
"status": 0
}
}]
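The sorting rule (missing values compare as less than non-missing ones) can be sketched in Python as follows. The helper sort_key and the IQ values are hypothetical and chosen only for illustration. The sketch relies on Python's sorted being stable; the ordering of entities whose sort values are all missing in the actual database is not specified here.

```python
def sort_key(entity, key):
    """Sort key under which a missing value orders before any set value."""
    value = entity.get(key)
    if value is None:
        return (0,)        # all missing values compare equal to each other
    return (1, value)      # set values come later, ordered by the value itself


# Hypothetical data: only some apples carry an "IQ" property.
apples = [
    {"color": "red", "IQ": 7},
    {"color": "green"},
    {"color": "yellow", "IQ": 3},
]
by_iq = sorted(apples, key=lambda entity: sort_key(entity, "IQ"))
colors = [entity["color"] for entity in by_iq]

# When no entity has the property, every key is equal and the sort is a no-op.
no_iq = [{"color": "red"}, {"color": "green"}]
unchanged = sorted(no_iq, key=lambda entity: sort_key(entity, "IQ"))
```

Because the first tuple element already separates missing from set values, None is never compared with a number, so sorting by an absent property does not raise an error.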

Example 3

Let us try to retrieve entities where a property is unset.

Query:

[{
"FindEntity": {
"with_class": "Apple_5",
"constraints": {
"origin": ["==", null]
},
"results": {
"list": ["color", "type", "origin"]
}
}
}]

Response:

[{
"FindEntity": {
"entities": [{
"color": "red",
"origin": null,
"type": "granny_smith"
}],
"returned": 1,
"status": 0
}
}]
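A constraint of the form ["==", null] can be imitated in Python as a filter on unset values. This is a minimal sketch assuming dict-based entities and the two apples from Example 1; the helper is_unset is a hypothetical name, not part of the query API.

```python
def is_unset(entity, key):
    """True when the property is unset, which includes an absent key."""
    return entity.get(key) is None


# Assumed data: the same two apples as in Example 1.
apples = [
    {"color": "green", "type": "granny_smith",
     "origin": "Allen, Neuquen, Argentina"},
    {"color": "red", "type": "granny_smith"},
]

# Keep only the entities whose "origin" is unset.
matches = [entity for entity in apples if is_unset(entity, "origin")]
```

Only the red apple passes the filter, which mirrors the single entity returned in the response above.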